S. Kullback

THE ENTROPY OF A MARKOV INFORMATION SOURCE

Let $S$ be a first-order Markov source with alphabet
$\{s_1, s_2, \ldots, s_q\}$, time-homogeneous transition probabilities
$P(s_i|s_j)$ and stationary distribution $p_i = \mathrm{Prob}(s = s_i)$,
$i = 1, 2, \ldots, q$. These define a simple stationary Markov
chain with the matrix of transition probabilities

$$P = \begin{pmatrix}
P(s_1|s_1) & P(s_2|s_1) & \cdots & P(s_q|s_1) \\
P(s_1|s_2) & P(s_2|s_2) & \cdots & P(s_q|s_2) \\
\vdots & \vdots & & \vdots \\
P(s_1|s_q) & P(s_2|s_q) & \cdots & P(s_q|s_q)
\end{pmatrix} \tag{1}$$

where

$$P(s_1|s_j) + P(s_2|s_j) + \cdots + P(s_q|s_j) = 1, \qquad j = 1, 2, \ldots, q, \tag{2}$$

and

$$p_j = p_1 P(s_j|s_1) + p_2 P(s_j|s_2) + \cdots + p_q P(s_j|s_q), \qquad j = 1, 2, \ldots, q. \tag{3}$$

If we set $\underline{p}' = (p_1, p_2, \ldots, p_q)$ then in matrix notation the
relations (3) are expressed by

$$\underline{p}' P = \underline{p}'. \tag{4}$$
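As a minimal sketch of how (3) and (4) can be solved in practice, the following Python fragment finds the stationary distribution exactly with rational arithmetic. The function name `stationary_distribution` is an illustrative choice, the matrix is the one from the worked example at the end of these notes, and the code assumes the chain has exactly one stationary distribution.

```python
from fractions import Fraction as F

def stationary_distribution(P):
    """Solve p'P = p' together with sum(p) = 1 by Gauss-Jordan elimination.

    P[i][j] is the probability of moving from state i to state j.
    Assumes the chain has exactly one stationary distribution.
    """
    q = len(P)
    # One balance equation per column j; the last one is redundant and is
    # replaced by the normalisation sum(p) = 1. Unknowns are p[0..q-1].
    rows = [[F(P[i][j]) - (1 if i == j else 0) for i in range(q)] + [F(0)]
            for j in range(q - 1)]
    rows.append([F(1)] * q + [F(1)])
    for col in range(q):
        pivot = next(r for r in range(col, q) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(q):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][q] for i in range(q)]

# Transition matrix of the worked example (row = current state).
P = [
    [F(2, 10), F(8, 10), 0, 0],
    [0, 0, F(1, 10), F(9, 10)],
    [0, 0, F(2, 10), F(8, 10)],
    [F(7, 10), F(3, 10), 0, 0],
]
p = stationary_distribution(P)
print(p)  # the four stationary probabilities as exact fractions
```

Exact fractions are used here only so that the result can be compared digit for digit with a hand calculation; floating-point elimination would serve equally well for larger chains.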




If the source is in state $s_j$ then its transitions to
the different states $s_i$, $i = 1, 2, \ldots, q$, form a finite scheme

$$\begin{pmatrix}
s_1 & s_2 & \cdots & s_q \\
P(s_1|s_j) & P(s_2|s_j) & \cdots & P(s_q|s_j)
\end{pmatrix}. \tag{5}$$


The entropy of the finite scheme in (5) we write as

$$H(S|s_j) = -\sum_{i=1}^{q} P(s_i|s_j) \log P(s_i|s_j) \tag{6}$$

and it may be regarded as a measure of the amount of information
obtained when the source (Markov process) advances one step
forward, starting from the state $s_j$. With
$P(s_i, s_j) = p_i P(s_j|s_i)$ denoting the joint probability of two
successive signals, the mean value of the quantity in (6) over all
states $s_i$, that is,

$$\begin{aligned}
H(S|S) &= \sum_{i=1}^{q} p_i H(S|s_i)
        = -\sum_{i=1}^{q} \sum_{j=1}^{q} p_i P(s_j|s_i) \log P(s_j|s_i) \\
       &= -\sum_{i=1}^{q} \sum_{j=1}^{q} P(s_i, s_j) \log \frac{P(s_i, s_j)}{p_i} \\
       &= -\sum_{i=1}^{q} \sum_{j=1}^{q} P(s_i, s_j) \log P(s_i, s_j)
          + \sum_{i=1}^{q} \sum_{j=1}^{q} P(s_i, s_j) \log p_i \\
       &= -\sum_{i=1}^{q} \sum_{j=1}^{q} P(s_i, s_j) \log P(s_i, s_j)
          + \sum_{i=1}^{q} p_i \log p_i \\
       &= H(S \times S) - H(S) = H(S^2) - H(S),
\end{aligned} \tag{7}$$

may be regarded as a measure of the mean amount of information
obtained when the source (Markov process) moves one step ahead.
The quantity $H(S|S)$, which we shall call the entropy of the source,
obviously characterizes the source, and is uniquely determined by
the absolute probabilities $p_i$ and the conditional probabilities
$P(s_j|s_i)$, $1 \le i \le q$, $1 \le j \le q$.
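The definition in (6) and (7) translates directly into a few lines of Python. This is only a sketch: the helper names `entropy` and `source_entropy` are illustrative, logarithms are taken to base 2 (bits), and the numbers are those of the worked example at the end of these notes.

```python
from math import log2

def entropy(dist):
    """Entropy in bits of one probability distribution (zero terms skipped)."""
    return -sum(x * log2(x) for x in dist if x > 0)

def source_entropy(p, P):
    """H(S|S) = sum_i p_i * H(S|s_i), with P[i][j] = P(s_j | s_i)."""
    return sum(p_i * entropy(row) for p_i, row in zip(p, P))

# Transition matrix and stationary distribution of the worked example.
P = [
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.2, 0.8],
    [0.7, 0.3, 0.0, 0.0],
]
p = [7 / 24, 1 / 3, 1 / 24, 1 / 3]

h = source_entropy(p, P)
print(round(h, 5))  # 0.69074
```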

Note that (7) may be written as

$$H(S^2) = H(S) + H(S|S), \tag{8}$$

which is essentially a special case of the general result

$$H(x, y) = H(x) + H(y|x). \tag{9}$$

It had been shown that generally

$$H(y) \ge H(y|x), \tag{10}$$

which in this case of a Markov source becomes

$$\log q \ge H(S) \ge H(S|S). \tag{11}$$
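Relations (9) and (10) hold for any joint distribution, not only for a Markov source, and can be checked numerically. The joint table below is hypothetical and chosen purely for illustration; the variable names are likewise assumptions of this sketch.

```python
from math import log2

def H(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * log2(v) for v in probs if v > 0)

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
joint = [
    [0.30, 0.10],
    [0.05, 0.25],
    [0.10, 0.20],
]
px = [sum(row) for row in joint]          # marginal of x
py = [sum(col) for col in zip(*joint)]    # marginal of y

H_xy = H([v for row in joint for v in row])
H_x = H(px)
H_y = H(py)
# H(y|x) = sum over x of P(x) times the entropy of the row renormalised by P(x)
H_y_given_x = sum(pxi * H([v / pxi for v in row]) for pxi, row in zip(px, joint))

print(H_xy, H_x + H_y_given_x)   # the two sides of (9)
print(H_y, H_y_given_x)          # the two sides of (10)
```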

If we now consider sequences of $(n+1)$ successive signals,
which we may also consider as constituted of the pair consisting
of a sequence of $n$ signals and a single signal, then we have

$$H(S^{n+1}) = H(S^n, S) = H(S^n) + H(S|S^n). \tag{12}$$

But the Markov chain property that the conditional probability of
a state depends only on the immediately preceding state implies that

$$H(S|S^n) = H(S|S) \tag{13}$$

or

$$H(S^{n+1}) = H(S^n) + H(S|S) \tag{14}$$

and by successive application of (14) we get

$$H(S^{n+1}) = H(S) + n H(S|S), \tag{15}$$

that is

$$H[(n+1)\text{-tuple}] = H(\text{single}) + n\, H(\text{transition}). \tag{16}$$
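Relation (15) can be confirmed by brute force: enumerate every sequence of $n+1$ states, assign it the probability of the stationary chain, and compare the entropy of that distribution with $H(S) + nH(S|S)$. The sketch below does this for the four-state chain of the worked example; the function name `block_entropy` is an illustrative choice.

```python
from itertools import product
from math import log2

P = [
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.2, 0.8],
    [0.7, 0.3, 0.0, 0.0],
]
p = [7 / 24, 1 / 3, 1 / 24, 1 / 3]   # stationary distribution of P

def H(probs):
    return -sum(x * log2(x) for x in probs if x > 0)

def block_entropy(length):
    """Entropy of all state sequences of the given length, started from p."""
    probs = []
    for seq in product(range(len(p)), repeat=length):
        pr = p[seq[0]]
        for a, b in zip(seq, seq[1:]):
            pr *= P[a][b]
        probs.append(pr)
    return H(probs)

H_S = H(p)
H_SS = sum(p_i * H(row) for p_i, row in zip(p, P))

for n in range(5):
    lhs = block_entropy(n + 1)       # H(S^{n+1}) by enumeration
    rhs = H_S + n * H_SS             # right-hand side of (15)
    print(n + 1, round(lhs, 6), round(rhs, 6))
```

The enumeration grows as $q^{n+1}$, so it is useful only as a check for short sequences; (15) itself gives the value for any $n$.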

We may also write

$$H(S^{n+1}) = H(S, S^n) = H(S) + H(S^n|S) \tag{17}$$

so that comparing (15) and (17), we see that

$$H(S^n|S) = n H(S|S) \tag{18}$$

and

$$H(S^{m+n}|S) = (m+n) H(S|S) = m H(S|S) + n H(S|S) = H(S^m|S) + H(S^n|S), \tag{19}$$

that is, the entropy in $(m+n)$ transitions is the sum of the
entropy in $m$ and $n$ transitions.

From (15) we see that

$$\frac{1}{n+1} H(S^{n+1}) = \frac{1}{n+1} H(S) + \frac{n}{n+1} H(S|S) \tag{20}$$

so that

$$\lim_{n \to \infty} \frac{1}{n+1} H(S^{n+1}) = H(S|S), \tag{21}$$

that is, the mean entropy per signal in a long sequence of
signals is simply the entropy of the Markov source.
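To get a feeling for how quickly the limit in (21) is approached, the right-hand side of (20) can be tabulated. The sketch below uses the values $H(S) = 1.76615$ and $H(S|S) = .69074$ bits from the worked example at the end of these notes; the function name is illustrative.

```python
H_S = 1.76615    # H(S) of the worked example, in bits
H_SS = 0.69074   # H(S|S) of the worked example, in bits

def mean_entropy_per_signal(n):
    """Right-hand side of (20): H(S^{n+1}) / (n + 1)."""
    return H_S / (n + 1) + n * H_SS / (n + 1)

for n in (0, 1, 9, 99, 999):
    print(n + 1, round(mean_entropy_per_signal(n), 5))
```

Rearranging (20) shows that the excess over $H(S|S)$ is exactly $(H(S) - H(S|S))/(n+1)$, so the mean entropy per signal decreases toward the source entropy at the rate $1/(n+1)$.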

For a source without memory (independence) we have that

$$H(S^2) = 2H(S) \tag{22}$$

and for a Markov (order 1) source we have that

$$H(S^2) = H(S) + H(S|S). \tag{23}$$

Since we had in (11) that $H(S) \ge H(S|S)$, a measure of the
correlation between successive signals may be taken as

$$2H(S) - H(S) - H(S|S) = H(S) - H(S|S) = 2H(S) - H(S^2). \tag{24}$$

Note that

$$I = \sum_i \sum_j P(s_i, s_j) \log \frac{P(s_i, s_j)}{p_i p_j}
\ge \Big( \sum_i \sum_j P(s_i, s_j) \Big) \log \frac{\sum_i \sum_j P(s_i, s_j)}{\sum_i \sum_j p_i p_j} = 0 \tag{25}$$

and

$$\begin{aligned}
I &= \sum_i \sum_j P(s_i, s_j) \log P(s_i, s_j)
     - \sum_i \sum_j P(s_i, s_j) \log p_i
     - \sum_i \sum_j P(s_i, s_j) \log p_j \\
  &= \sum_i \sum_j P(s_i, s_j) \log P(s_i, s_j)
     - \sum_i p_i \log p_i - \sum_j p_j \log p_j \\
  &= -H(S^2) + 2H(S) = H(S) - H(S|S),
\end{aligned} \tag{26}$$

where $I = 0 \Leftrightarrow P(s_i, s_j) = p_i p_j$, that is, no memory.
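The equality of the two expressions for $I$ in (25) and (26) can be checked on the chain of the worked example at the end of these notes. This is a sketch only; the variable names are illustrative, and logarithms are to base 2.

```python
from math import log2

P = [
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.2, 0.8],
    [0.7, 0.3, 0.0, 0.0],
]
p = [7 / 24, 1 / 3, 1 / 24, 1 / 3]
q = len(p)

# Joint probability of two successive signals, P(s_i, s_j) = p_i * P(s_j | s_i).
joint = [[p[i] * P[i][j] for j in range(q)] for i in range(q)]
# Marginal of the second signal; it equals p again because p is stationary (3).
p_next = [sum(joint[i][j] for i in range(q)) for j in range(q)]

# Left-hand side of (25).
I = sum(joint[i][j] * log2(joint[i][j] / (p[i] * p_next[j]))
        for i in range(q) for j in range(q) if joint[i][j] > 0)

H_S = -sum(x * log2(x) for x in p)
H_SS = -sum(joint[i][j] * log2(P[i][j])
            for i in range(q) for j in range(q) if joint[i][j] > 0)

print(round(I, 5), round(H_S - H_SS, 5))  # both 1.07541, as in the example
```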

The reader is reminded that all the preceding was
relative to a first-order Markov source.

Let us now turn our attention to the case of an m-th
order Markov source, that is, the conditional probability
of a signal value or state depends only on the preceding
$m$ signal values or states. As before we have the relation
in (12) (which we repeat here)

$$H(S^{n+1}) = H(S^n, S) = H(S^n) + H(S|S^n) \tag{12}$$

but now the specification of an m-th order source implies that

$$H(S|S^n) = H(S|S^m), \qquad n \ge m, \tag{27}$$

or

$$H(S^{n+1}) = H(S^n) + H(S|S^m) \tag{28}$$

and by successive application of (28) we get

$$H(S^{n+1}) = H(S^m) + (n - m + 1) H(S|S^m), \qquad n \ge m. \tag{29}$$

As in (17) we may also write

$$H(S^{n+1}) = H(S^m, S^{n-m+1}) = H(S^m) + H(S^{n-m+1}|S^m) \tag{30}$$

so that comparing (29) and (30) we see that

$$H(S^{n-m+1}|S^m) = (n - m + 1) H(S|S^m), \qquad n \ge m, \tag{31}$$

and

$$H(S^{n_1+n_2}|S^m) = (n_1 + n_2) H(S|S^m) = n_1 H(S|S^m) + n_2 H(S|S^m)
= H(S^{n_1}|S^m) + H(S^{n_2}|S^m). \tag{32}$$

From (29) we see that

$$\frac{1}{n+1} H(S^{n+1}) = \frac{1}{n+1} H(S^m) + \frac{n-m+1}{n+1} H(S|S^m), \qquad n \ge m, \tag{33}$$

so that

$$\lim_{n \to \infty} \frac{1}{n+1} H(S^{n+1}) = H(S|S^m), \tag{34}$$

that is, the mean entropy per signal in a long sequence of signals
is simply the entropy of the m-th order Markov source.
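Relation (29) can be checked by enumeration in the same way as the first-order case. The sketch below uses a hypothetical second-order ($m = 2$) binary source; the conditional probabilities in `P1` are an illustrative choice, not taken from these notes, and the pair probabilities in `pair` are the stationary distribution for that particular choice.

```python
from itertools import product
from math import log2

m = 2
# Hypothetical source: P1[(a, b)] = P(next = 1 | previous two signals = a, b).
P1 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.8}
# Stationary probabilities of the pairs (a, b) for this choice of P1.
pair = {(0, 0): 5 / 14, (0, 1): 2 / 14, (1, 0): 2 / 14, (1, 1): 5 / 14}

def cond(c, a, b):
    """P(next = c | previous two signals = a, b)."""
    return P1[(a, b)] if c == 1 else 1 - P1[(a, b)]

def H(probs):
    return -sum(x * log2(x) for x in probs if x > 0)

def block_entropy(length):
    """H(S^length) for length >= m, by enumerating all binary sequences."""
    probs = []
    for seq in product((0, 1), repeat=length):
        pr = pair[seq[:2]]
        for k in range(2, length):
            pr *= cond(seq[k], seq[k - 2], seq[k - 1])
        probs.append(pr)
    return H(probs)

# H(S|S^m): average over pairs of the entropy of the next-signal distribution.
H_cond = sum(pair[ab] * H([cond(0, *ab), cond(1, *ab)]) for ab in pair)
H_m = block_entropy(m)

for n in range(m, 8):
    lhs = block_entropy(n + 1)                 # H(S^{n+1}) by enumeration
    rhs = H_m + (n - m + 1) * H_cond           # right-hand side of (29)
    print(n + 1, round(lhs, 6), round(rhs, 6))
```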

If we use the notation $P(S^n)$ to represent the probability
of a sequence of $n$ successive signals, then by the convexity
property

$$I = \sum_{S^m} \sum_{S} P(S^m, S) \log \frac{P(S^m, S)}{P(S^m) P(S)} \ge 1 \cdot \log \frac{1}{1} = 0, \tag{35}$$

or

$$\begin{aligned}
I &= \sum_{S^m} \sum_{S} P(S^m, S) \log P(S^m, S)
     - \sum_{S^m} \sum_{S} P(S^m, S) \log P(S^m)
     - \sum_{S^m} \sum_{S} P(S^m, S) \log P(S) \\
  &= \sum_{S^{m+1}} P(S^{m+1}) \log P(S^{m+1})
     - \sum_{S^m} P(S^m) \log P(S^m)
     - \sum_{S} P(S) \log P(S) \\
  &= -H(S^{m+1}) + H(S^m) + H(S)
   = -H(S^m) - H(S|S^m) + H(S^m) + H(S)
\end{aligned} \tag{36}$$

so that

$$I = H(S) - H(S|S^m) \ge 0, \tag{37}$$

with $I = 0 \Leftrightarrow$ a signal is independent of the preceding $m$
signals, is a measure of the relation between a signal and the
preceding $m$ signals in an m-th order Markov source.

To assist the reader to relate the exposition and notation
in these notes with that in the text by Abramson we indicate
equivalent values and results in the following table.

    Abramson                        Kullback

    H(S)                            H(S|S)       order 1 source
    H(S)                            H(S|S^m)     order m source
    H(\bar{S})                      H(S)
    H(\overline{S^n})               H(S^n)
    H(S^n)                          H(S^n|S)     order 1 source
    H(S^n)                          H(S^n|S^m)   order m source

    (2-?9) p. 28                    (11)
    (2-37) p. 31                    (18)
    (2-41) p. 31, (2-40) p. 31      (15)
    (2-42) p. 31                    (29)
    (2-45) p. 32                    (34)
    (2-44) p. 32                    (10) with y = S^n, x = S^m, order m source

cf. On the entropy of Markov chains by G. A. Ambarcumjan,
Izv. Akad. Nauk Armjan. SSR Ser. Fiz.-Mat. Nauk 11 (1958) no. 0,
31-40. Selected Papers in Math. Statist. and Probability,
Vol. 4 (1963), pp. 1-11.
EXAMPLE - Entropy Markov Chain

Transition probabilities (row = present state, column = next state):

            s1     s2     s3     s4
    s1      .2     .8     0      0
    s2      0      0      .1     .9
    s3      0      0      .2     .8
    s4      .7     .3     0      0

Using (7),

    H(S|S) = 7/24 H(.2, .8) + 1/3 H(.9, .1) + 1/24 H(.2, .8) + 1/3 H(.7, .3)
           = 7/24 (.721928) + 1/3 (.468996) + 1/24 (.721928) + 1/3 (.881291) bits
           = .690738 bits


Stationary Probabilities

    .2 p1 + .7 p4 = p1
    .8 p1 + .3 p4 = p2
    .1 p2 + .2 p3 = p3
    .9 p2 + .8 p3 = p4

    -8 p1 + 7 p4 = 0
    p2 - 8 p3 = 0
    9 p2 + 8 p3 = 10 p4
    8 p3 = p4 = p2

    7/8 p4 + p4 + 1/8 p4 + p4 = 1

    p4 = 1/3,  p1 = 7/24
    p2 = 1/3,  p3 = 1/24


    H(S) = 7/24 log 24/7 + 1/3 log 3 + 1/24 log 24 + 1/3 log 3
         = 1/3 log 24 + 2/3 log 3 - 7/24 log 7
         = 1/3 (4.584962) + 2/3 (1.584962) - 7/24 (2.807355) bits
         = 1.76615 bits


Probabilities of pairs of successive signals:

    pair      probability
    s1 s1     .2(7/24) = 14/240
    s1 s2     .8(7/24) = 56/240
    s1 s3     0
    s1 s4     0
    s2 s1     0
    s2 s2     0
    s2 s3     .1(1/3)  =  8/240
    s2 s4     .9(1/3)  = 72/240
    s3 s1     0
    s3 s2     0
    s3 s3     .2(1/24) =  2/240
    s3 s4     .8(1/24) =  8/240
    s4 s1     .7(1/3)  = 56/240
    s4 s2     .3(1/3)  = 24/240
    s4 s3     0
    s4 s4     0

     n      n log2 n
    14       53.30297
    56      325.2119
     8       24.0000
    72      444.2346
     2        2.0000
    24      110.0391

    H(S^2) = 14/240 log 240/14 + 56/240 log 240/56 + 8/240 log 240/8
           + 72/240 log 240/72 + 2/240 log 240/2 + 8/240 log 240/8
           + 56/240 log 240/56 + 24/240 log 240/24
           = log 240 - 1308.0005/240
           = 5.321928 + 2.584962 - 5.45 bits = 2.45689 bits

    H(S^2) - H(S) = 2.45689 - 1.76615 = .69074 bits

    2H(S) - H(S^2) = 3.53230 - 2.45689 = 1.07541

    H(S) - H(S|S) = 1.76615 - .69074 = 1.07541
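The arithmetic for $H(S^2)$ rests on the identity $\sum (n/N) \log (N/n) = \log N - (1/N) \sum n \log n$ with $N = 240$. A short Python sketch of that shortcut follows; the dictionary name `counts` is an illustrative choice, and the entries are the nonzero pair probabilities of the example in units of 1/240.

```python
from math import log2

# Nonzero probabilities of successive pairs, in units of 1/240.
counts = {"s1s1": 14, "s1s2": 56, "s2s3": 8, "s2s4": 72,
          "s3s3": 2, "s3s4": 8, "s4s1": 56, "s4s2": 24}

total = sum(counts.values())                       # N = 240
n_log_n = sum(n * log2(n) for n in counts.values())

# sum (n/N) log2 (N/n) = log2 N - (1/N) * sum n log2 n
H_S2 = log2(total) - n_log_n / total
H_S = log2(24) / 3 + 2 * log2(3) / 3 - 7 * log2(7) / 24

print(total, round(n_log_n, 4), round(H_S2, 5), round(H_S2 - H_S, 5))
```

Computed in full precision the sum of the $n \log_2 n$ terms is about 1308.0004; the figure 1308.0005 above results from adding the individually rounded table entries, and the difference does not affect $H(S^2)$ to five decimals.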


